Results 1 - 20 of 140
1.
bioRxiv ; 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38659906

ABSTRACT

Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate-coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest that a continuum of homology-mediated rearrangement processes is integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.

2.
bioRxiv ; 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37873367

ABSTRACT

Background: The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a type of complex genomic rearrangement (CGR) hypothesized to result from replicative repair of DNA due to replication fork collapse. It is often mediated by a pair of inverted low-copy repeats (LCR) followed by iterative template switches resulting in at least two breakpoint junctions in cis. Although it has been identified as an important mutation signature of pathogenicity for genomic disorders and cancer genomes, its architecture remains unresolved and is predicted to display at least four structural variation (SV) haplotypes. Results: Here we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the genomic DNA of 24 patients with neurodevelopmental disorders identified by array comparative genomic hybridization (aCGH), in whom we found evidence for the existence of 4 out of 4 predicted SV haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping and StrandSeq, the haplotype structure was resolved in 18 samples. This approach refined the point of template switching between inverted LCRs in 4 samples, revealing a DNA segment of ~2.2-5.5 kb of 100% nucleotide similarity. A prediction model was developed to infer the LCR used to mediate the non-allelic homology repair. Conclusions: These data provide experimental evidence supporting the hypothesis that inverted LCRs act as a recombinant substrate in replication-based repair mechanisms. Such inverted repeats are particularly relevant for formation of copy-number associated inversions, including the DUP-TRP/INV-DUP structures. Moreover, this type of CGR can result in multiple conformers, which contribute to generating diverse SV haplotypes in susceptible loci.

3.
Nature ; 621(7978): 355-364, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date [1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes [2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established [1] boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.


Subjects
Human Y Chromosomes, Molecular Evolution, Humans, Male, Human Y Chromosomes/genetics, Human Genome/genetics, Genomics, Mutation Rate, Phenotype, Euchromatin/genetics, Pseudogenes, Genetic Variation/genetics, Human X Chromosomes/genetics, Pseudoautosomal Regions/genetics
4.
Genome Med ; 15(1): 47, 2023 Jul 07.
Article in English | MEDLINE | ID: mdl-37420249

ABSTRACT

BACKGROUND: Cancer genome sequencing enables accurate classification of tumours and tumour subtypes. However, prediction performance is still limited using exome-only sequencing and for tumour types with low somatic mutation burden such as many paediatric tumours. Moreover, the ability to leverage deep representation learning in discovery of tumour entities remains unknown. METHODS: We introduce here Mutation-Attention (MuAt), a deep neural network to learn representations of simple and complex somatic alterations for prediction of tumour types and subtypes. In contrast to many previous methods, MuAt utilizes the attention mechanism on individual mutations instead of aggregated mutation counts. RESULTS: We trained MuAt models on 2587 whole cancer genomes (24 tumour types) from the Pan-Cancer Analysis of Whole Genomes (PCAWG) and 7352 cancer exomes (20 types) from the Cancer Genome Atlas (TCGA). MuAt achieved prediction accuracy of 89% for whole genomes and 64% for whole exomes, and a top-5 accuracy of 97% and 90%, respectively. MuAt models were found to be well-calibrated and perform well in three independent whole cancer genome cohorts with 10,361 tumours in total. We show MuAt to be able to learn clinically and biologically relevant tumour entities including acral melanoma, SHH-activated medulloblastoma, SPOP-associated prostate cancer, microsatellite instability, POLE proofreading deficiency, and MUTYH-associated pancreatic endocrine tumours without these tumour subtypes and subgroups being provided as training labels. Finally, scrutiny of MuAt attention matrices revealed both ubiquitous and tumour-type specific patterns of simple and complex somatic mutations. CONCLUSIONS: Integrated representations of somatic alterations learnt by MuAt were able to accurately identify histological tumour types and identify tumour entities, with potential to impact precision cancer medicine.


Subjects
Mutation, Neoplasms, Neoplasms/genetics, Neoplasms/pathology, Humans, Deep Learning, Benchmarking
5.
Genome Res ; 33(4): 496-510, 2023 04.
Article in English | MEDLINE | ID: mdl-37164484

ABSTRACT

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.


Subjects
Satellite DNA, Genetic Polymorphism, Humans, Satellite DNA/genetics, Haplotypes, Genomic Segmental Duplications, DNA Sequence Analysis
6.
Genome Biol ; 24(1): 100, 2023 04 30.
Article in English | MEDLINE | ID: mdl-37122002

ABSTRACT

The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~21% increase in sensitivity, improving mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1-23.1, and 22q11.21.


Subjects
Human Genome, Genetic Polymorphism, Humans, Genomic Structural Variation, Chromosome Inversion
7.
Cell Genom ; 3(4): 100281, 2023 Apr 12.
Article in English | MEDLINE | ID: mdl-37082141

ABSTRACT

Cancer genomes harbor a broad spectrum of structural variants (SVs) driving tumorigenesis, a relevant subset of which escape discovery using short-read sequencing. We employed Oxford Nanopore Technologies (ONT) long-read sequencing in a paired diagnostic and post-therapy medulloblastoma to unravel the haplotype-resolved somatic genetic and epigenetic landscape. We assembled complex rearrangements, including a 1.55-Mbp chromothripsis event, and we uncover a complex SV pattern termed templated insertion (TI) thread, characterized by short (mostly <1 kb) insertions showing prevalent self-concatenation into highly amplified structures of up to 50 kbp in size. TI threads occur in 3% of cancers, with a prevalence up to 74% in liposarcoma, and frequent colocalization with chromothripsis. We also perform long-read-based methylome profiling and discover allele-specific methylation (ASM) effects, complex rearrangements exhibiting differential methylation, and differential promoter methylation in cancer-driver genes. Our study shows the advantage of long-read sequencing in the discovery and characterization of complex somatic rearrangements.

13.
Haematologica ; 108(2): 543-554, 2023 02 01.
Article in English | MEDLINE | ID: mdl-35522148

ABSTRACT

Histone methylation-modifiers, such as EZH2 and KMT2D, are recurrently altered in B-cell lymphomas. To comprehensively describe the landscape of alterations affecting genes encoding histone methylation-modifiers in lymphomagenesis we investigated whole genome and transcriptome data of 186 mature B-cell lymphomas sequenced in the ICGC MMML-Seq project. Besides confirming common alterations of KMT2D (47% of cases), EZH2 (17%), SETD1B (5%), PRDM9 (4%), KMT2C (4%), and SETD2 (4%), also identified by prior exome or RNA-sequencing studies, we here found recurrent alterations to KDM4C in chromosome 9p24, encoding a histone demethylase. Focal structural variation was the main mechanism of KDM4C alterations, and was independent from 9p24 amplification. We also identified KDM4C alterations in lymphoma cell lines including a focal homozygous deletion in a classical Hodgkin lymphoma cell line. By integrating RNA-sequencing and genome sequencing data we predict that KDM4C structural variants result in loss-of-function. By functional reconstitution studies in cell lines, we provide evidence that KDM4C can act as a tumor suppressor. Thus, we show that identification of structural variants in whole genome sequencing data adds to the comprehensive description of the mutational landscape of lymphomas and, moreover, establish KDM4C as a putative tumor suppressive gene recurrently altered in subsets of B-cell derived lymphomas.


Subjects
B-Cell Lymphoma, Lymphoma, Humans, Histones/metabolism, Histone Demethylases/genetics, Homozygote, Sequence Deletion, Lymphoma/genetics, B-Cell Lymphoma/genetics, Whole Genome Sequencing, RNA, Jumonji Domain-Containing Histone Demethylases/genetics, Jumonji Domain-Containing Histone Demethylases/chemistry, Jumonji Domain-Containing Histone Demethylases/metabolism, Histone-Lysine N-Methyltransferase/genetics
15.
Nat Biotechnol ; 41(6): 832-844, 2023 06.
Article in English | MEDLINE | ID: mdl-36424487

ABSTRACT

Somatic structural variants (SVs) are widespread in cancer, but their impact on disease evolution is understudied due to a lack of methods to directly characterize their functional consequences. We present a computational method, scNOVA, which uses Strand-seq to perform haplotype-aware integration of SV discovery and molecular phenotyping in single cells by using nucleosome occupancy to infer gene expression as a readout. Application to leukemias and cell lines identifies local effects of copy-balanced rearrangements on gene deregulation, and consequences of SVs on aberrant signaling pathways in subclones. We discovered distinct SV subclones with dysregulated Wnt signaling in a chronic lymphocytic leukemia patient. We further uncovered the consequences of subclonal chromothripsis in T cell acute lymphoblastic leukemia, which revealed c-Myb activation, enrichment of a primitive cell state and informed successful targeting of the subclone in cell culture, using a Notch inhibitor. By directly linking SVs to their functional effects, scNOVA enables systematic single-cell multiomic studies of structural variation in heterogeneous cell populations.


Subjects
Chromothripsis, Leukemia, Neoplasms, Humans, Neoplasms/genetics, Leukemia/genetics, Gene Rearrangement, Cell Line, Genomic Structural Variation
16.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subjects
Chromosome Mapping, Diploidy, Human Genome, Genomics, Humans, Chromosome Mapping/standards, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, Reference Standards, Genomics/methods, Genomics/standards, Human Chromosomes/genetics, Genetic Variation/genetics
17.
Genome Res ; 32(10): 1941-1951, 2022 10.
Article in English | MEDLINE | ID: mdl-36180231

ABSTRACT

Gibbons are the most speciose family of living apes, characterized by a diverse chromosome number and rapid rate of large-scale rearrangements. Here we performed single-cell template strand sequencing (Strand-seq), molecular cytogenetics, and deep in silico analysis of a southern white-cheeked gibbon genome, providing the first comprehensive map of 238 previously hidden small-scale inversions. We determined that more than half are gibbon specific, at least fivefold higher than shown for other primate lineage-specific inversions, with a significantly high number of small heterozygous inversions, suggesting that accelerated evolution of inversions may have played a role in the high sympatric diversity of gibbons. Although the precise mechanisms underlying these inversions are not yet understood, it is clear that segmental duplication-mediated NAHR only accounts for a small fraction of events. Several genomic features, including gene density and repeat (e.g., LINE-1) content, might render these regions more break-prone and susceptible to inversion formation. In the attempt to characterize interspecific variation between southern and northern white-cheeked gibbons, we identify several large assembly errors in the current GGSC Nleu3.0/nomLeu3 reference genome comprising more than 49 megabases of DNA. Finally, we provide a list of 182 candidate genes potentially involved in gibbon diversification and speciation.


Subjects
Hominidae, Hylobates, Animals, Hylobates/genetics, Genome, Primates/genetics, Chromosome Inversion/genetics, Chromosomes, Hominidae/genetics
18.
Annu Rev Genomics Hum Genet ; 23: 123-152, 2022 08 31.
Article in English | MEDLINE | ID: mdl-35655332

ABSTRACT

Somatic rearrangements resulting in genomic structural variation drive malignant phenotypes by altering the expression or function of cancer genes. Pan-cancer studies have revealed that structural variants (SVs) are the predominant class of driver mutation in most cancer types, but because they are difficult to discover, they remain understudied when compared with point mutations. This review provides an overview of the current knowledge of somatic SVs, discussing their primary roles, prevalence in different contexts, and mutational mechanisms. SVs arise throughout the life history of cancer, and 55% of driver mutations uncovered by the Pan-Cancer Analysis of Whole Genomes project represent SVs. Leveraging the convergence of cell biology and genomics, we propose a mechanistic classification of somatic SVs, from simple to highly complex DNA rearrangement classes. The actions of DNA repair and DNA replication processes together with mitotic errors result in a rich spectrum of SV formation processes, with cascading effects mediating extensive structural diversity after an initiating DNA lesion has formed. Thanks to new sequencing technologies, including the sequencing of single-cell genomes, open questions about the molecular triggers and the biomolecules involved in SV formation as well as their mutational rates can now be addressed.


Subjects
Genomic Structural Variation, Neoplasms, Human Genome, Genomics, Humans, Mutation, Neoplasms/epidemiology, Neoplasms/genetics, Neoplasms/pathology, Prevalence
19.
Cell ; 185(11): 1986-2005.e26, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35525246

ABSTRACT

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.


Subjects
Chromosome Inversion, Genomic Segmental Duplications, Chromosome Inversion/genetics, DNA Copy Number Variations/genetics, Human Genome, Genomics, Humans
20.
Leukemia ; 36(7): 1759-1768, 2022 07.
Article in English | MEDLINE | ID: mdl-35585141

ABSTRACT

The mechanisms underlying T-ALL relapse remain essentially unknown. Multilevel-omics in 38 matched pairs of initial and relapsed T-ALL revealed 18 (47%) type-1 (defined by being derived from the major ancestral clone) and 20 (53%) type-2 relapses (derived from a minor ancestral clone). In both types of relapse, we observed known and novel drivers of multidrug resistance including MDR1 and MVP, NT5C2 and JAK-STAT activators. Patients with type-1 relapses were specifically characterized by IL7R upregulation. In remarkable contrast, type-2 relapses demonstrated (1) enrichment of constitutional cancer predisposition gene mutations, (2) divergent genetic and epigenetic remodeling, and (3) enrichment of somatic hypermutator phenotypes, related to BLM, BUB1B/PMS2 and TP53 mutations. T-ALLs that later progressed to type-2 relapses exhibited a complex subclonal architecture, unexpectedly, already at the time of initial diagnosis. Deconvolution analysis of ATAC-Seq profiles showed that T-ALLs later developing into type-1 relapses resembled a predominant immature thymic T-cell population, whereas T-ALLs developing into type-2 relapses resembled a mixture of normal T-cell precursors. In sum, our analyses revealed fundamentally different mechanisms driving either type-1 or type-2 T-ALL relapse and indicate that differential capacities of disease evolution are already inherent to the molecular setup of the initial leukemia.


Subjects
Precursor T-Cell Lymphoblastic Leukemia-Lymphoma, Child, Clonal Evolution/genetics, Humans, Mutation, Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics, Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/metabolism, Recurrence